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1. Introduction 

This paper deals with a severely ill-posed inverse problem which comes from quantum 
optics. Quantum optics is a branch of quantum mechanics which studies physical 
systems at the atomic and subatomic scales. As the language used by physicist^f] differs 
from the one that is used by statisticians, we start with general notions on quantum 
mechanics. The interested reader can get further acquaintance with quantum concepts 
through the textbooks or the review articles [U El El II]- 

1.1. Physical background 

1.1.1. Quantum mechanics In quantum mechanics, the quantum state of a system is 
a mathematical object which encompasses all the information about the system. The 
most common representation of a quantum state is an operator p on a complex Hilbert 
space "H (called the space of states) satisfying the three following conditions: 

(i) Self adjoint: p = p*, where p* is the adjoint of p. 

(ii) Positive: p > 0, or equivalently (if), pip) > for all ip G H. 

(iii) Trace one: Tr(p) = 1. 

A quantum state p encodes the probabilities of the measurable properties, or 
"observables" (energy, position, ...) of the considered quantum system. Generally, 
in quantum mechanics the expected results of the measurements of an observable are 
not deterministic values but predictions about probability distributions, that is the 
probability of obtaining each of the possible outcomes when measuring an observable. 
An observable X is described by a self adjoint operator on the space of states H and 

dirnU. 

X = *« P a> 

a 

where the eigenvalues {x a } a of the observable X are real and P a is the projection 
onto the one dimensional space generated by the eigenvector of X corresponding to the 
eigenvalue x a . Then, when performing a measurement of the observable X of a quantum 
state p, the result is a random variable X with values in the set of the eigenvalues of 
the observable X. For a quantum system prepared in state p, X has the following 
probability distribution and expectation function 

F P (X = x a ) = Tr(P aP ) and E P (X) = Tr(Xp). 

An important element which affects the result of the measurement process is the purity 
of quantum states. A state is called pure if it cannot be represented as a mixture (convex 
combination) of other states, i.e., if it is an extreme point of the convex set of states. 
All other states are called mixed states. We give examples of states in Section [3j 

| (e.g. they speak about 'states" or "observable" instead of "laws" or "random variables"...) 
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1.1.2. Quantum optics In this paper, the quantum system we work with is a 
monochromatic light in a cavity described by a quantum harmonic oscillator. In the 
framework of quantum optics, the space of states is known to be the separable Hilbert 
space TL = L.2(IR), i.e. the space of square integrable complex valued functions on the 
real line. A particular orthonormal basis (^')j 6 N ~~ called the Fock basis - comes with 
this Hilbert space. This physically very meaningful basis is defined for all j G N as 
follows 

^(x) := — ^=H 3 {x)e-* 2 l\ (1) 

where Hj(x) := (— l)i e x2 -^e~ x2 is the j-th Hermite polynomial. In the Fock basis ([!]), 
a state is described by an infinite density matrix p = [pj,k]j,keW- 

We may give an equivalent representation for a quantum state p in terms of the 
associated Wigner function W p (see [5J). The Wigner function W p is a real function 
of two variables and may be defined by its Fourier transform J-" 2 with respect to both 
variables 

W p {u,v) := r 2 \W p }{u,v) = Tr(pexp(mQ + mP)), 

where Q and P are respectively the electric and magnetic fields. These two observables, 
we are concerned by, do not commute. As non-commuting observables, they may not 
be simultaneously measurable. Therefore, by performing measurements on (Q,P), we 
cannot get a probability density of the result (Q,P). However, for <ft G [0, 7r] we can 
measure the quadrature observables := Q cos + P sin <f), and then the above Wigner 
function plays the role of a quasi-probability density. It does not satisfy all the properties 
of a conventional probability density but satisfies boundedness properties unavailable for 
classical densities. For instance, the Wigner function can and normally does go negative 
for states which have no classical model. The Wigner function is such that 

• W p : R 2 R 

• / JW p (q,p)dqdp = 1, 

Furthermore, its Radon transform is always a probability density 

/oo 
W p (x cos <fi — t sin 0, x sin <fi + t cos <f))dt, (2) 
-oo 

with respect to -A, A being the Lebesgue measure onlx [0, n]. 

Now we can make explicit the links between the state p and the Radon transform p p (x\<j)) 
of the Wigner function W p associated to p. In the Fock basis ([T]), the entries pj^ of the 
infinite density matrix p are given by 

Pj,k = \J J\ P (x\(P)U k (x)e-^ k -^d^dx (3) 

for all j, k G N. The functions fj^ = f^j, in the expression ([3]), are bounded real 
functions commonly called pattern functions in quantum homodyne literature. A 
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concrete expression for their Fourier transform fkj using Laguerre polynomials L c r 
is ( cf |6J): for j > k, 



f k>j (t) = n^y-^ — ltlt^e-^L^i-). (4) 

We recall that the Laguerre polynomial of degree n and order a is defined by 

d n 

L a (x) := (n\)- l e x x- a — (e- x x n+a ). 

dx n 

1.1.3. Quantum Homodyne Tomography In this paper, we address the problem of 
reconstructing the density matrix p of a monochromatic light in a cavity. As the ob- 
servables Q and P cannot be measured simultaneously, we measure the quadrature 
:= Q cos (ft + P sin 0, where <fi G [0, n]. Each of these quadratures could be measured 
on a laser beam by a technique put in practice for the first time in [7] and called Quan- 
tum Homodyne Tomography (QHT). The theoretical foundation of quantum homodyne 
tomography was outlined in [8|. 

The experimental set-up, described in Figure [TJ consists of mixing the cavity pulse 
prepared in state p with an additional laser of high intensity \z\ » 1 called the local 
oscillator. After the mixing, the beam is split again and each of the two emerging beams 
is measured by one of the two photodetectors which give integrated currents I\ and I2 
proportional to the number of photons. The result of the measurement is produced 
by taking the difference of the two currents and rescaling it by the intensity \z\. Just 
before the mixing the experimentalist may choose the phase $ of the local oscillator, 
randomly, uniformly distributed on [0, 7r]. In the case of noiseless measurement and for 
a phase $ = 0, the result = ^jfp 1 has density p p {x\<p) corresponding to measuring X^. 

In practice, a number of photons fails to be detected. These losses may be quantified 
by one single coefficient 77 G [0,1], such that 77 = when there is no detection and 
77 = 1 corresponds to the ideal case (no loss). The physicists argue, that their machines 
actually have high detection efficiency, around 0.8/0.9. Thus, we suppose r] known. As 
the detection process is inefficient, an independent gaussian noise interferes additively 
with the ideal data X^. Thus for $ = <p, the effective result of the QHT measurement 
(Figure [T]) is for a known efficiency 77 g]0.5, 1], 



Y = JfjX ll> +y/(l-r,)/2Z 
where £ is a standard Gaussian random variable, independent of X^. 

1.2. Statistical model 



This paper aims at reconstructing the density matrix of a monochromatic light in a 
cavity prepared in state p. As we cannot measure precisely the quantum state in a single 
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Figure 1. QHT measurement scheme 
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experiment, we perform measurements on n independent identically prepared quantum 
systems. The measurement carried out on each of the n systems in state p is done by 



QHT as described in Section |1.1.3| In the ideal setting, the results of such experiments 
would be n independent identically distributed random variables (Xi, $1), . . . , (X n , $ n ) 
with values in M x [0, 7r] and distribution P p having density with respect to A, (A being 
the Lebesgue measure on Rx [0, 7r]) equal to 

p p (x,(f>) = -p p (x\<f>) = -K\W p ](x,<f>), (5) 

7T 71 

where 1Z is the Radon transform defined in equation ([2]). As underlined in Section 1.1.3 
we do not observe (Xg, &e)e=i t ... n but the noisy version (Yg, ^e)i=i,...n where 



Y e :=y/jjX e + y/(l-ri)/2S e . 



(6) 



Here £/s are independent standard Gaussian random variables, independent of all 
(Xi, <&£), I = 1, . . . , n. The detection efficiency 77 G]0.5, 1] is a known parameter and 
1 — f] represents the proportion of photons which are not detected due to various losses 
in the measurement process. 



Let us denote by p^(y, (ft) the density of (Ye, 
Pp(-|0) is the convolution of the density 



centered Gaussian distribution having variance (1 



Then, for $ = <ft, the conditional density 
of s/r]X with N v the density of a 
77)/2, that is 



p%y\4>) 



* W (y) 



1 



y -x t 



\<P N v (x)dx. 



(7) 



For $ 
is 



a useful equation in the Fourier domain, deduced by the previous relation 
HVVP v P (-Vv\mt) = Fil Pp (-\<P)](t)N~<(t), (8) 
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where T\ denotes the Fourier transform with respect to the first variable and N v (t) = 
e 4r > is the Fourier transform of N^(x), the density of a centered Gaussian density 
having variance (1 — T])/2r] = 7. 

In order to estimate the elements of the density matrix defined in (|3| from the data 
(Ye, $£)t=i r .. n , we define a realistic class of quantum states lZ(C,B,r). For C > 1, 
B > and < r < 2, the class TZ(C, B, r) is defined as follow 

lZ(C,B,r) := {p quantum state : \p m , n \ < Cexp(— B(m + n) r ^ 2 )}. (9) 

Note that the class TZ(C, B, r) has been translated in terms of Wigner functions in |9], 
where it has been proved that the fast decay of the elements of the density matrix im- 
plies both rapid decay of the Wigner function and of its Fourier transform. 

However, on the contrary to previous works, we do not assume here that the constants 
r, B and C are known. From now on we denote by (•, ■) and || ■ || the usual Euclidean 
scalar product and norm. 

1.3. Outline of the results 

This paper deals with the problem of adaptive estimation of density matrix p in QHT 
when taking into account the detection losses occurring in the measurement, leading to 
an additional Gaussian noise in the measurement data. In order to compute the per- 
formance of our procedure in L 2 risk, we defined in previous section a realistic class of 
quantum states TZ(C, B,r) in which the elements of the density matrix decrease rapidly. 
From the physical point of view, all the states which have been produced in the labo- 
ratory up to date belong to such a class, and a more detailed argument can be found 
in [10], as to why this assumption is realistic and in [9] as how to translate this class in 
terms of associated Wigner functions. 

The problem of reconstructing the quantum state of a light beam has been extensively 
studied in quantum statistics and physical literature. Methods for reconstructing a 
quantum state are based on the estimation of either the density matrix p or the Wigner 
function W p . 

The estimation of the density matrix from averages of data has been considered in the 
framework of ideal detection (rj = 1) in [TTj [T2| [T3l IT3] . Max-likelihood methods have 
been studied in [151 [HJ EE [E] and procedure using adaptive tomographic kernels to 
minimize the variance has been proposed in [18J. In a more general case of an efficiency 
parameter rj belonging to the interval ]l/2, 1], the estimation of the density matrix of a 
quantum state of light has been discussed in |T9| IT6j 120] and considered in [21 J via the 
pattern functions for the diagonal elements. The problem of goodness-of-fit testing in 
quantum statistics has been considered in |22| . In this noisy setting, the latter paper 
derived a testing procedure from a projection-type estimator where the projection is 
done in L 2 distance on some suitably chosen pattern functions. 
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For the problem of pointwise estimation of the Wigner function, we mention the work 
[23] in the case of ideal detection, that corresponds to r\ — 1, where a kernel estimator is 
given and its sharp minimax optimality over a class of Wigner functions characterised by 
their smoothness is established. The same problem in the noisy setting rj G]l/2, 1] was 
treated in |1U| . where the minimax rates were obtained. The estimation of a quadratic 
functional of the Wigner function, as an estimator of the purity, was explored in |24| . 
Recently, the more general case rj g]0, 1] was investigated in [S]. The authors provided 
rates of convergence in L2 loss for both an estimator of the Wigner function and an es- 
timator of the density matrix. Interestingly, the rates are polynomial in the case r = 2, 
whereas they are intermediate for r g]0,2[, where intermediate means that they are 
slower than any power of n but faster than any power of logrt. However, the physicists 
argue, that their machines actually have high detection efficiency, around 0.9. So we 
do not deal in this paper with values of 77 smaller than 1/2. It is to be noted that 
the estimator proposed in [9j depends on the knowledge of B and r. This is a serious 
limitation since in practice, one will face situations where one wants to reconstruct a 
density matrix without assuming the knowledge of B and r. This is known in statis- 
tics as "adaptive estimation". In the present work, we tackle the problem of adaptive 
estimation over the classes of quantum states {TZ{C, B, r)}. Our estimator is actually a 
soft-thresholded version of the estimator in [9J which allows us to reach adaptation. 
Coefficients thresholding is now a classical tool in statistics. It was introduced in a se- 
ries of papers |25| [26| [27] in the context of function estimation via wavelets coefficients. 
We refer to [28j for a comprehensive introduction to thresholding and waveltes. These 
methods were extended to inverse problems [291 130| l31| 132] , see [33] for an introduction 
and a survey of the most recent results. 

The remainder of the article is organized as follows. In Section [2j we present our adaptive 
thresholding procedure and state our main theoretical results. In particular, we establish 
upper bounds on the L2 risk of our procedure and its achieves the convergence rates 
over a broad family of set lZ(C,B,r) which have been obtained in [9]. These bounds 
are nonasymptotic and hold true with large probability. The theoretical investigation is 
complemented by numerical experiments reported in Section [3] The proofs of the main 
results are defered to the Appendix. 

2. Density matrix estimation 

We assume now n independent identically distributed random pairs (Yj, $j)i=i,..., n are 
observed, where $1 is uniformly distributed in [0, n] and the conditional density of Y\ 
given $1 is p^, cf (|7]). The goal is to estimate the density matrix [pj,k]j,k defined by 
(J3J) and to investigate the convergence rate of the proposed estimator. To achieve this 
goal, we follow the framework of [9J by assuming that the quantum state p is in some 
class TZ(C, B, r) defined in ([9]). The notable difference of the present setting is that the 
precise knowledge of C, B and r are not required by our estimating procedure. 
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2.1. Adapted pattern functions 

In order to reconstruct the entries of the density matrix from the noisy observations 
(Ye, $g) by a projection type estimator on the pattern functions, we have to adapt the 
pattern functions as follows. From now on, we shall use the notation 7 = 7(77) := 
We denote by f^- the function which has the following Fourier transform: 

f~W ■■= fkAt)e^ 2 , (10) 
where fkj are the pattern functions defined in equation Ml). 



2.2. Estimation procedure 

The estimation procedure we introduce in this section will depend on one tuning 
parameter iV := N(n), the precise value of which will be given later. We define the 
set of indices J(N) C N 2 by 

J(N) := {{j, k) G N 2 , < j + k < N - 1}. (11) 

We first define an initial estimator p v of p by setting 

p l= S i ELi G hk {% $,) v(j, k) e j(jv), (12) 

otherwise, 



where {Gj^)^ k are constructed using the pattern functions in (10) and 

G&M) (13) 

Note that this procedure introduced by [9j estimates the matrix coefficients by replacing 
the theoritical by its empirical conterpart. To define our final procedure of estimation, 
let us introduce some notation. From now, we denote by ||.||oo the supremum norm for 
functions, i.e. for any /, 

. = sup |/(:e) I . 



Let e G (0, 1) be a prescribed tolerance level. The final estimation procedure applies 
the soft-thresholding operator to the initial one : 

(14) 

\Pj,k\ 

with the convention 0/0 = 0, and where the thresholds are defined as 



tj,k - 2 H/jJjfe||«A 



log (^f±^) 



n 

Thus, our estimator of the density matrix is given by 



(15) 



P V = iPU,k. 
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2.3. Main results 

To characterize the behaviour of the estimator p v , we measure the quality of estimation 
in ^2-norm. For any density matrix v = {vj,k)j,k>o, we define the ^2-norm of v as 



M2 



1^1 

j,k>0 



We first state a risk bound that holds with large probability and will allow us to obtain 
the rates of convergence on the classes TZ(C, B, r). 

Proposition 2.1 With probability at least 1 — e, we have 



1 (j,fc)e/ (j,fc)£/ 



J,fc 



where the set J(N) is defined in (11). 



The proof is given in the Appendix A Note that this result holds true for any value of 
the tuning parameter N. Choosing this parameter in a suitable manner leads to a rate 
of convergence that coincides with the one obtained in [U] for a nonadaptive procedure. 
This result is stated in the following Theorem. 



Theorem 2.1 Let us put r G (0, 2), B > and let us choose 

'log(n) 



N = N(n) :-- 



2B 



(16) 



where \_x\ denotes the integer part of x such that [x\ < x < \_x\ + 1. Let us assume 
that p G lZ(C,B,r), for some unknown C > 1, B > Bq, r G [ro,2]. Then, there are 
constants Ci,C2,Cz > such that with probability at least 1 — e, we have 

• For i] = 1 and r G [r , 2] 

\\p v — p\\ 2 < Cin~ (logfn)) 57 log (log(n)£~ J . 



For n G (|, 1) and r = 2 



2 < C 2 n 4 ^+ s flog(n) + (log(n)) 1/3 log (log(ra)e -1 )) . 



For T] G (|, 1) and r G (r , 2) 



|^-p|| 2 2 <C 3 e- 2BM W r/2 (log 



n 



,2-r/2 



+ log(n) 1/3 log (bg(n)e *)) , 
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where M(n) satisfies 8^M(n) + 2BM(n) r ^ 2 = log(n). In particular, note that 



M(n) 



log(n) 



2B 



log(n) r//2 + o(log(n 



The proof is given in the Appendix B Let us give some comments on this result 
highlighting its relations to previous work. First of all, note that the convergence rate is 
polynomial in the cases (rj, r) e {2} x [r , 2] and (rj, r) e (1/2, 1) x {2}. Furthermore, the 
rate is parametric, up to a logarithmic factor, in the first case. It is slower in the second 
case, but becomes closer to the parametric rate when B is very large. The benefits of 
the adaptation are particularly striking in this case. Indeed, if, for example, the only 
available information is that B > 1/3 and rj = 3/4, then the estimator proposed in [9] 
will converge at the rate n 1 / 2 log n, even if the true state p belongs to the class 1Z(C, B, 2) 
with a very large constant B > 1/3. Contrarily to this, our estimator will converge at 

_ 3-B - 

the rate n 1 + 3S logn which can be very close to the parametric rate n if B is large. 
One can also note that when (i],r) G (1/2, 1) x (r ,2), the rate we get is slower than 
any power of n' 1 , but faster that any power of (logra) -1 . We will say that these rates 
are intermediate. They coincide, up to a log(log(n) je) factor, with the rates obtained 
in [£l El]. Another interesting feature of the previous result is that it provides a risk 
bound with high probability, whereas existing results are all concerned with bounding 
the expected risk. 

Interestingly, the same procedure achieves the nearly parametric rate in the case of pure 
state as well. 



Theorem 2.2 Under the same choice for N in Theorem \2.1 

login 



N = N(n) :-- 



2Bn 



if p is a pure state, i.e., if Pj j = 1 for some jo and all the other Pj^'s are 0. Then we 
have, as soon as N > max(j ,2), with probability at least 1 — e, 



Ph < 



64 

nr 



II/: 



2 , i' 2 ^g(n) 



JO JO I loo 



log 



B e 



The proof is given in the |Appendix C 
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3. Experimental evaluation 

3.1. Examples considered in the experiments 

We present in Table [T] examples of pure quantum states, which can be created at this 
moment in laboratory and belong to the class 1Z(C, B, r) with r = 2. Table [I] gives also 
their density matrix coefficients pj t j. and probability densities p p (x\4>). 
Among the pure states we consider the vacuum state, which is the pure state of zero 
photons. Note that the vacuum state would provide a random variable of Gaussian 



probability density p p {x\(p) via the ideal measurement of QHT (see Section 1.1.3). That 
explains the Gaussian nature of the noise in the effective result of the QHT measure- 
ment. 

We consider also the single photon state which is the pure state of one photon and the 
coherent-qo state, which characterizes the laser pulse with the number of photons Pois- 
son distributed with an average of M photons. Remark that the well-known Schrddinger 
cat state is described by a linear superposition of two coherent vectors (see e.g. |35j). 



Table 1. Examples of quantum states 
Vacuum state 

• Po,o = 1 res t zero, 

Single photon state 

• pi 5 i = 1 rest zero, 

• p p {x\(f)) = 2x 2 e~ x2 1 ' \pn. 
Coherent-q state q e R 

• Pj , k = e-M 2 (q /V2y +k /VW; 

• p p {x\(j)) = exp(-(a; - g cos((p)) 2 ) / y/n . 

Thermal state (3 > 

• Psjk = S 3 k (l - e-fy-e", 

• p p {x\4>) = v /tanh(/3/2)/vr exp(-x 2 tanh(/3/2)). 
Schrddinger cat q > 

• pj t k = 2(go/ 'v / 2)- ?+fc / (vO!^K ex P(<?o/2) + exp(— ^q/2))), for j and k even, rest zero, 

• p p {x\4>) = (exp(-(x - g cos(0)) 2 ) + exp(-(a; + q cos(0)) 2 ) 

+2cos(2g o xsin(0)) exp(-x 2 - g 2 cos 2 (0))) / (2^(1 + exp(—q$))) . 



3.2. Pattern functions f T - \ 

Since there is no closed-form expression for the pattern functions fj k , we evaluate them 
numerically on a 1-D regular grid of Q = 4096 points. We use expressions Q and ([TO]) 
to evaluate fj^ and fj k on the 1-D frequency grid of Q discretized t points. The adapted 
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pattern functions f^ k are computed on the 1-D spacial grid of Q discretized x points by 
applying to fV k the inverse Fast Fourier Transform (FFT) in 0(Qlog(Q)) operations. 
Some pattern and adapted functions are depicted in Figure [2j 



Figure 2. Examples of pattern functions fj k (a) and adapted pattern functions fj k 
(b). 
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3. 3. Implementation of our procedure 



The deconvolved estimator p 1 - k defined in (12) is computed by evaluating 



G hk {x^) = fl k {x)e-'^ 

" f 'j,k 



at point x using a cubic spline interpolation of the values of f? k on the discrete grid of 
Q points. 



In the following section, we assess the performance of the threshold estimator (P- k . We 
perform this evaluation by creating noisy samples as defined in (|6]). The initial 
samples Xg are drawn from the distribution p p (x\(j)) (see Table [T]) using the rejection 



method. The value of iV = N(n) is set following (16). We use ro = 2 and Bq = 1/2 for all 
the numerical experiments. A toolbox that implements this procedure and reproduces 
all the figures of this article is available onlind^} In Figure |3j represents the density 
matrices p and the estimated density matrices p v of some quantum state. 



§ http://www.ceremade.dauphine.fr/ peyre/ codes / 
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Figure 3. First row: The density matrix p respectively of the coherent state, the 
chrodinger cat state and Thermal state. Following rows: estimated p v of previous 
states for Bo = 0.5, r\ = 0.9, e = 1 and n respectively equal to 10 x 10 3 (row #2), 
100 x 10 3 (row #3), 500 x 10 3 (row #4). 




(a) Coherent qo = 3 (b) Schrodinger cat qo = 3 (c) Thermal j3 = 1/4 
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3.4- Studies of the performance of our estimation procedure 

We estimate numerically the (relative) root mean square error (RMSE) 

RMSEH = ||p"-p|| 2 /||p|| 2 

of our soft thresholding estimator. More precisely, Figure [4] shows the evolution with n 
of the expected value of the RMSE. This expected value is evaluated by an empirical 
mean with Monte Carlo simulation using 50 replications for each value of n. To evaluate 
the deviation with respect to this mean, we also display the confidence interval at ±3 
times the standard deviation of the RMSE. 



Figure 4. Evolution of E(RMSE(n)) as a function of n for r\ — 0.9, e = 1 and 
N = 30. The blue shaded area represent the confidence interval at ±3 times the 
standard deviation of RMSE(n). 
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(c) Thermal ^ = 1/10 



(d) Thermal j3 = 1/4 



The threshold values tj^ that are used in ( 14 ) to define our estimator are somewhat con- 
servative. In practice, smaller values offer better decay of the RMSE. Figure [4] displays 
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in dashed red (resp. dashed green) the decay of the RMSE obtained using thresholds 
0.8£jfc (resp. 0.5tjk)- We found on these three examples and for rj = 0.9 that using 
0.5^ gives consistently the lowest RMSE among other choices of thresholds propor- 
tional to the tj t k values. 

We found numerically that the decay of the RMSE with n almost perfectly fits a 
power-law, which (up to logarithmic factor) is in accordance with the upper-bounds 



of Corollary 2.1 Following this Corollary in the setting 77 G (|, 1) and r = 2, we fit a 
power law of the form 

E(RMSE(n)) « n~^+B) . 



We perform a linear regression in a log-log domain to estimate B. Table |2j reports the 
estimated value of B we found using this procedure. 

Table 2. Estimated values of B when using rj = 0.9, £ = 1 and N = 30. 



Coherent qo = 3 


Schr 6 dinger cat go = 3 


Thermal = 1/10 


Thermal = 1/4 


B w 0.174 


B w 0.227 


5 w 0.037 


B w 0.082 
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Appendix A. Proof of Proposition |2.1 

The proofs follow the main lines of |5oj I3T]. First, we need a set of preliminary lemmas. 



Appendix A.l. Some preliminary results 

First, we remind Hoeffdig's inequality for bounded random variables. 

Lemma 1 Let us assume that Z\, Z n are independent real-valued random variables 
with di < \Zi\ < 6j. Then, for any A > ; 



P 



E i z * - E (^)] 



i=l 



> A < 2 exp 



2A 2 



As a consequence, we have the following inequality for complex random variables. 

Lemma 2 Let us assume that Z\, Z n are independent complex-valued random 
variables with \Z\\ < c. Then, for any t > 0, 



P 



i=l 



> t < 4 exp 



nt 2 \ 
Ic^J ' 



Proof: We have 



P 



E - E (^)] 



i=l 



> A < P 



ReE[^- E (^)] 



i=i 



> 



v/2. 



+ P 



i=l 



< P 



E[Re(Z i )-E(Re(Z i ))] 



8=1 



> 



> 



V2 

75. 



+ p 



E[Im(Z t )-E(Irm%))] 



i=i 



> 



Now, we apply Hoeffding's inequality to the random variables Re(Zj) which satisfy 
— c < Re(Zj) < c. So we have: 



P 



E[Re(Z i )-E(Re(Z i ))] 



i=i 



> 



v/2 



< 2 exp 



v/2 



\ 



Er=i(2c) 



2 exp 



—) 

Anc 2 J 



We have exactly the same result for Im(Zj) so finally: 



P 



8=1 

Put t = X/n to end the proof. □ 



> A < 4 exp 



A 2 



4nc 2 7 
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Lemma 3 For some fixed e G (0, 1), let us define the set 

n £ ■= {v(j, k) g j(n), \m k - Pjik \ < t jik } 



17 



where the (tj^)j,k are defined in (15) and the set J(N) is defined in (11). Then we have 

p(n e ) > 1 - e. 

Proof: Lemma [3] is proved by using Hoeffding's inequality. In this aim, we have to first 
notice 

E p[plk\ = Pj,h- 
Indeed, by using Q , Q, ([10]) and @, we have 

E P [pl k ) = E p [G hk {^n = E p [fl k {^)e-^} 



-i(j-k)<f>_ 



-i(j-k)<t> 



27T 
1 

2^ 



fj,k (v) Vvp v p (vVvlfy dydcf) 

fUt)^[VvP v P (-vv\<pm)dtd<p 
kkity it2 '■F l [ Vp {-\<p)](t)N\t)dtd<i ) 



f j:k (x)p p (x\(j))dxd(j) = pj- 



Moreover, we easily get from the definition of Gj ;k in (13) that for all I — 1 • • • , n and 

V(j,k)eJ(N) 



G 



< Wfl 



j,k\\°°- 



Then, for 



tj,k 



^ll/j^lloo"^ 



log 



2N(N+1) 

e 



and according to Lemma [2] 

P (\pl k ~ Pj,k\ > tj,k) < 4exp 



n 



M\fj,k\\oc 



2e 



N(N + 1) 



By the classical union bound argument: 



2s 



(i,fc)eJ(JV) 



{j,k)£J(N) 



N(N + 1) 



< E. 



□ 



Lemma 4 For some fixed e G (0, 1) and V(j, k) G J(N), with J(N) defined in (11), we 
define the set 

R £ jk : = \y density matrix, \u jjk - pl k \ < t jyk ] , 
where the (tj >k )j tk are defined in (15). Then, on the event f2 e defined in Lemma^ and 

V(j,k)eJ(N) ' 
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(i) P^Rlk- 

(ii) R e k is closed and convex set. 

(Hi) For U £ j k the orthogonal projection onto R £ - k and for any density matrix v, 
||p- n ;,fc(>)|l2 ^ \\P ~ u \\l- 



(A.l) 



Proof: The first point is just a consequence of Lemma [3} The second point comes from 
the definition of R £ k . 

Moreover, it is well known that for any closed and convex set C, if He is the orthogonal 
projection on C, the following property holds: 

WxeC,Vy, \\Tl c (y) - x\\ 2 < \\y - x\\ 2 . 

This concludes the proof of the third point. □ 



Lemma 5 For e £ (0, 1), any fixed (j, k) £ J(N), with J(N) defined in (11), and any 
density matrix v, we denote by u f the projection of v into R 6 j k , 

with Rj k defined in Lemma ^ Then, the entries is' e m of v' are equal to 



V, 



( I P\k ~ Vjjk | - tj, k ) , if (£, m) = (j, k) , 



v^m, otherwise, 
with the convention 0/0 = 0. 



-> 



Adaptive estimation of density matrix in QHT 19 
Proof: The projection v' of v into R e - k satisfies 

oo 

v' = arg min \\u - x\\ 2 2 = arg min } \xt m - ^, m | 2 . 

J J t,m=0 

As the constraint x G Rj k is only a constraint on Xj^, it is clear that for (£, m) ^ (j, k) 
the minimum is reached for Xj^ = Uj t k- Finally, 

^ = arg min |^-«^|'. 

•&j,k m \-^j,k Pj^\—i''j,k 

The solution i/ fc is obvious: 



+ |S fc -?.J ( I ~ ^i.* I - hk) otherwise 



n 7 ? — 7/ 

- + I ^ _ 1 U^J,* ~~ ^'> fc l ~ W + ■ 

\Pj,k 

This ends the proof. □ 

Definition 1 For m > an integer, let 

I:={(ji,k 1 ),...,{j m ,k m )}Cj(N) 



be a set of indices, where J(N) is the set defined in (Tl\ )- that W ^ i, (je, ki) ^ (ji, ki). 
Fore G (0, 1) and for any density matrix v, we denote by Tlf (y) the successive projections 
of v into spaces [Rj it kjj k , i- e - 

Note that for any set of indices / and from Lemma [5j the application of the successive 
projections ITf to a density matrix v does not depend on the order of the successive 
projections. 



Lemma 6 For e G (0, 1), for J(N) defined in (11) and for Jp defined in (14), we have 

pV = W J(N) (0), 

where is the zero-infinite matrix. 

Proof: This is obvious from the definition of fp and from Lemma [5] applied to v — 0. □ 
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Appendix A. 2. Proof of Proposition 2.1 



Proof: For J{N) the set of indices defined in QTTJ) , let I be a subset of J(N), I C J(iV). 
For a fixed e G (0,1), we have by Lemma M and by successive applications of the 



inequality( A.l ) to all pair of indices (j,k) ^ / 

Hp" -p\\1= \m(o) -pII1<I|ii?(o)-* 

Moreover, from Lemma [5] applied to v = 0, we get 



(nj(o)) 



otherwise. 



Therefore, from (A.2) we get 



j,fe=0 



P?i fc /|»IJ < \ 

P.7,fe 



(?,fc)e/ 

E i a ^i 2+ E m 

(j,k)ei {j,kW 



E 

{j,k)0 



where 



n' 

A _ "j>* /I «)) I i \ 

^i,fc - Pi.fc - -p^TT l|Pj,fc| _ t W + 

Pj,k, ti\Pj,k\ ^ *i,*> 

Pj,k - r§ri (\p],k | - hk) > otherwise. 



Moreover 



I P],k ~ PjM | + I P\ k I , if I Pjfc I < *i^fc > 



|P/,fc _ Pi A + tj,k, 



otherwise 



< pL - Pj,A + ^,fe- 



For any (j, k) E I and on the event f2 e defined in Lemma |3j it holds 

\Pj,k ~ Pj,k\ < hk- 



Therefore from (A. 3) 



IIp^-pII^ E ( 2 ^) 2 + E i^i 2 - 

We conclude the proof by taking the infimum over the set / C J(N). □ 



(A.2) 



(A.3) 
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Appendix B. Proof of Theorem 2.1 



Proof: For r G (0, 2), B > and JV as in flip] ), let M be an integer s.t. M < N. We 
define the set 

J(M) := {(j, fc)GN 2 ,0<j+K Af }. 



Then, for e G (0,1) and by applying Proposition 2.1 to I = J(M), with probability 
larger than 1 — e, we obtain 



\p n — p\\o < inf 

z 0<M<7V- 



E E M 2 } 

I 0<j+k<M j+k>M ) 



inf { — V ||f\||^log(2iV(iV + l)/ £ 



0<M<iV-l n 

0<j+k<M 



E M 2 } 



(b.i; 

j+/c>M 



For 1] = 1 and r G [r , 2] . 



As fj k = fj y k for 77 = 1, we have by pluging ( |D.1[ ) and (D.3) into (B.I) 



|p" -p|| 2 < _ inf 



0<M<7V- 



J^f E n^n 2 oiog(2iv(iv+i)/5)+ e m 2 } 

I 0<j+k<M j+k>M ) 



< ]llf log(A/e) + CM 2 -ie" 2BM5 } 



o<M<iv-i i. n 



for some constant c\ > 0. 



For JV such in (|16[ and by taking M = (\og(n)/2B) 2 / r < N, it leads to 

2 20 , 

Hp* 7 — p\\ 2 < Cilog (log(n)/e) (log(n)) 3r n 
for some constant Ci > 0. 



(B.2) 



b) For n G (1/2, 1) and r = 2. 



Next, we deal with the case 1 > r\ > 1/2. We plug ( |D.1[ ) and ( |D.2| into ( |B.1[ ) to 
obtain in the case r = 2 



I ~ri || 2 

\P V -Ph 



< inf (-log(A7£)M^ M + CMe- 2BM ) 

0<A/<iV In ) 



for some constant c 2 > 0. 
By taking M = M(n) s.t. 



M 



log(w) 
2(4 7 + 73)' 
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we obtain 

\\p v -p\\l< C 2 n-w^B (log (log(n)/e) (log(n)) 1/3 + log(n)) , 
for some constant C 2 > 0. 



c) For 77 G (1/2, 1) and r G (r , 2). 

Finally, in the case 77 G (1/2, 1) and r G (r ,2) and by plugging (D.l) and (D.2) into 



(B.l ) we get: 



\\~p^-p\\l< inf \^\og(N/e)Mh^ M + CM 2 -h- 2BMi ) , 
z o<m<n In J 

for some constant C3 > 0. 

For M a solution of the equation 87M + 2BM^ = log(n) and for 

M(n) = i- log(n) - j^^j- 2 log(n) r/2 + o(log(n) r / 2 ) 

in particular, we obtain 

- p\\l < C 3 exp- 2BMr/2 (log(n) 2 - r / 2 + log(n) 1 / 3 log (JV/e)) , 
for some constant C3 > 0. □ 



Appendix C. Proof of Theorem 2.2 



Proof: We apply Theorem 2.1 for / = {(jo, jo)}- We obtain, with probability larger 
than 1 — e, 

16 



77 



E n^ii« lo s + l )/^)+ E 



2 



(j,k)=(jo,jo) 



16 



||/ JOJO || 2 log(2iV(iV + l)/e) + 0. 



For n large enough, yV = iV(n) > 2. Then, (JV + 1) < 2N and 

1 6 

W-pWI <-|l/,o.ollLlog(4iV 2 / £ ) 
16 „. , |2 



n 
16 



||/ JOJO || 2 [21og(iV) + log(4/ £ )] 



— „ 1 1 /jo, jo 1 1 00 



n 



^o g (^) + lo g (4 /£ ) 



where we replaced iV by its definition. As r < 2, 4/r > 1 and we have the following 
rough upper bound: 



16 



WP-pWi <-!!/*. 



64 



Jo 1 1 00 



4 /log(ra)\ 4, , A . . 



II/, ' 21og(n) 



□ 
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Appendix D. Technical Lemmas 

Useful lemmas are the following. 

Lemma 7 For p G 1Z(C, B,r), the set defined in there exists a M s.t. VM > M 
implies 

E \ Pj , k \ 2 <CM^e- 2BM \ (D.l) 

j+k>M 



2 



where C — tt ■ 

Br 

Proof: For p G TZ(C,B,r), we have by the definition of the class lZ{C,B,r) and by 
Lemma 3 in [9j 

~Br~ 



E M 2 <^ E exp(-25( J + A;r/ 2 )<^M 2 -ie- 2 ^^ 



j+k>M j+k>M 
□ 

Lemma 8 For 7] G (1/2, 1), i/iere exists a positive constant > s.t. 

E H/ZX < C^Mh^\ (D.2) 
o<i+fe<A/ 

where 7 = (1 — n)/(4r]) and the (fj k )j,k are the adapted pattern functions defined in 
expression |7Z| ). 

There exists a positive constant > s.t. 

E H/i.*H» < C-Mf , (D.3) 
o<i+fe<A/ 

where the (fj t k)j,k are the pattern functions defined in expression Q). 

Proof: For the proof of this lemma, we refer to Lemma 4 and Lemma 5 in [9J. □ 
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